10x use case #50

Status: Open (wants to merge 10 commits into master)


ayeaton commented Feb 13, 2019

Hi Alicia,

I wanted to use chromVAR for 10x ATAC-seq data. I saw that one suggestion was to use the fragments BED file and modify the chromVAR functions to treat one of its columns as the cell barcode. I implemented these changes in my branch and added a small test case. The data file test_x10_bed.tsv contains the first 1,000 rows of the 10x PBMC 5k ATAC-seq fragments file (http://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_5k/atac_v1_pbmc_5k_fragments.tsv.gz).

I changed the following things:

In R/get_inputs.R, I added a function called get_counts_from_x10_beds that handles 10x fragment BED files. I also made minor changes to the functions readAlignmentFromBed, left_right_to_grglist, and getCounts.
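For context on the input: each line of a 10x fragments file is tab-separated with the chromosome, start, end, cell barcode, and the number of reads supporting the fragment, so "treating a column as a barcode" amounts to keying each fragment by its fourth column. The PR does this in R; below is only a minimal Python sketch of the parsing step, assuming that five-column layout. `Fragment`, `parse_fragment_lines`, and `read_fragments` are hypothetical names, not part of chromVAR.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class Fragment(NamedTuple):
    """One row of a 10x fragments file (assumed five-column layout)."""
    chrom: str
    start: int         # 0-based, BED-style start
    end: int           # end-exclusive
    barcode: str       # column 4: the cell barcode
    read_support: int  # column 5: reads supporting this fragment


def parse_fragment_lines(lines: Iterable[str]) -> Iterator[Fragment]:
    """Yield Fragment records, treating column 4 as the cell barcode.

    Blank lines and '#' comment/header lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, start, end, barcode, support = line.split("\t")[:5]
        yield Fragment(chrom, int(start), int(end), barcode, int(support))


def read_fragments(path: str) -> Iterator[Fragment]:
    """Stream fragments from a plain or gzip-compressed (.gz) file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fragment_lines(handle)
```

Streaming the records as a generator keeps memory flat even for full-size fragments files, which matters for the matrix-size concerns raised later in the thread.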

AliciaSchep (Contributor) commented Mar 1, 2019

Thanks for the pull request! I didn't see it earlier; I'll take a look, but I suspect it will be a good addition!

AliciaSchep (Contributor) left a review

A couple of initial comments. I'm hoping to recruit someone with more 10x knowledge to review it more thoroughly for correctness.

Review threads:
R/get_inputs.R (outdated, resolved)
R/get_inputs.R (outdated, resolved)
R/get_inputs.R (outdated, resolved)
tests/testthat/test_get_counts.R (resolved)
AliciaSchep (Contributor) commented

Thanks for the updates, @ayeaton. There are two things I'm trying to understand: (1) what the function will do with multiple BED files as input, whether they share some barcodes or have entirely different ones, and (2) whether there is an easy way to assemble the count matrix as a sparse matrix from the start (as is done when using RG tags for BAMs), which would help with memory. The current implementation first builds a dense matrix, which is then converted to sparse by the call to Matrix. Since 10x samples are likely to be large, it would be better to construct the matrix as sparse from the outset.
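One way to avoid the dense intermediate is to accumulate only the nonzero (peak, barcode) counts as coordinate triplets while streaming fragments, and build the sparse matrix once at the end. The sketch below illustrates that idea in Python; it is not chromVAR's R code. It assumes peaks are non-overlapping within a chromosome and that a fragment counts once toward a peak if either end (positions `start` and `end - 1`) falls inside it; `build_peak_index`, `find_peak`, and `sparse_counts` are hypothetical names.

```python
import bisect
from collections import Counter, defaultdict


def build_peak_index(peaks):
    """Group peaks by chromosome, sorted by start.

    peaks: list of (chrom, start, end), 0-based and end-exclusive.
    Assumes peaks do not overlap within a chromosome.
    Returns {chrom: (starts, ends, peak_ids)}.
    """
    by_chrom = defaultdict(list)
    for peak_id, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, peak_id))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([s for s, _, _ in items],
                        [e for _, e, _ in items],
                        [p for _, _, p in items])
    return index


def find_peak(index, chrom, pos):
    """Return the id of the peak containing pos, or None (binary search)."""
    if chrom not in index:
        return None
    starts, ends, ids = index[chrom]
    k = bisect.bisect_right(starts, pos) - 1  # last peak starting at or before pos
    if k >= 0 and pos < ends[k]:
        return ids[k]
    return None


def sparse_counts(fragments, peaks):
    """Accumulate peak x barcode counts as COO triplets; no dense matrix is built.

    fragments: iterable of (chrom, start, end, barcode).
    Every barcode seen gets a column, even if none of its fragments hit a peak.
    Returns (rows, cols, values, barcodes) with 0-based indices.
    """
    index = build_peak_index(peaks)
    barcode_ids = {}
    counts = Counter()  # only nonzero (peak, barcode) cells are stored
    for chrom, start, end, barcode in fragments:
        col = barcode_ids.setdefault(barcode, len(barcode_ids))
        hits = {find_peak(index, chrom, start), find_peak(index, chrom, end - 1)}
        hits.discard(None)  # a fragment counts at most once per peak
        for peak_id in hits:
            counts[(peak_id, col)] += 1
    keys = sorted(counts)
    rows = [r for r, _ in keys]
    cols = [c for _, c in keys]
    values = [counts[k] for k in keys]
    barcodes = sorted(barcode_ids, key=barcode_ids.get)
    return rows, cols, values, barcodes
```

The triplets are 0-based; in R the corresponding final step would be `Matrix::sparseMatrix(i = rows + 1, j = cols + 1, x = values, dims = c(n_peaks, n_barcodes))`, so memory scales with the number of nonzero cells rather than with peaks × barcodes.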

ayeaton (Author) commented Mar 3, 2019

Hi Alicia,

Thanks for your comments!

To address your first question: if the same barcodes appear in more than one BED file, the user can supply a different name for each file through the colData argument of getCounts(). Each file's name is then appended to that file's barcodes, so every barcode name is unique. I added a test case to demonstrate this.
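The de-duplication scheme described above can be illustrated with a small Python sketch (not the PR's R code): each file's barcodes get that file's sample name appended, and the combined list is checked for uniqueness, because a naive string join can still collide. The `label_barcodes` name and the `_` separator are assumptions for illustration only.

```python
def label_barcodes(barcodes_by_sample, sep="_"):
    """Combine per-file barcodes into one list of unique column names.

    barcodes_by_sample: dict mapping a per-file sample name (e.g. taken from
    colData) to the barcodes observed in that file. The sample name is
    appended to each barcode, so the same barcode seen in two files becomes
    two distinct columns. The separator is a hypothetical choice.
    """
    labelled = []
    for sample, barcodes in barcodes_by_sample.items():
        labelled.extend(f"{barcode}{sep}{sample}" for barcode in barcodes)
    # Joining strings can still collide (e.g. "x" + "_a_b" vs "x_a" + "_b"),
    # so verify uniqueness instead of assuming it.
    if len(set(labelled)) != len(labelled):
        raise ValueError("labelled barcodes are not unique; check sample names")
    return labelled
```

For example, `label_barcodes({"rep1": ["AAAC-1", "TTTG-1"], "rep2": ["AAAC-1"]})` gives `["AAAC-1_rep1", "TTTG-1_rep1", "AAAC-1_rep2"]`.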

To address your second question: I moved the creation of the sparse matrix earlier in the function, but I'm not sure how to make the portion of the function that relies on GRanges objects more memory-efficient.

AliciaSchep (Contributor) commented

Thanks for the clarification regarding multiple files (and for the new test case!). I will take a closer look at the matrix creation to see whether I can suggest a more concrete change, but I might not get to it for a few days.
